OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Local features-based script recognition from printed bilingual document images

Internal identifier: 000775 (Main/Exploration); previous: 000774; next: 000776

Authors: S. Abirami [India]; D. Manjula [India]

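Source: International journal of computer applications in technology (Int. j. comput. appl. technol., ISSN 0952-8091), 2010

RBID: Pascal:11-0056451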

Abstract

Classifying and identifying the language in a biscript document is an important step in the design of an OCR system, since successful analysis and recognition depend on it. This paper presents an architecture for script recognition in bilingual (Tamil and English) document images that specifically addresses the challenges of recognition at the character level: it predicts the script of a word image from the word's initial character, which lets it adapt to various font faces and sizes. The recogniser models every character as Tetra bit values (TBV), which capture the character's spatial spread and are derived from its segmented grids. A decision tree classifier (DTC) is employed to classify the script from the patterns generated by the TBV. The spatial features-based script recogniser (SFBSR) is trained and tested on bilingual document images containing various Tamil and English words to show its effectiveness for script identification. The classification accuracy on the training and testing sets is promising. An evaluation of the system's performance against various other techniques shows a significant performance improvement for SFBSR. The recogniser can be embedded in an OCR system ahead of its recognition stage.
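The abstract does not specify exactly how the Tetra bit values are computed or how the decision tree is built, so the following is only a minimal sketch of one possible reading, under stated assumptions. It assumes that each character bitmap is cut into a 2×2 grid, that each grid cell is cut into four quadrants, and that a quadrant contributes one bit when it contains ink, which gives one 4-bit value per cell. A plain ID3 tree (information-gain splits, standard library only) stands in for the authors' DTC. The function names (`tetra_bit_values`, `build_tree`, `word_script`) and the toy bitmaps are hypothetical. They are not the authors' code or data.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Tuple, Union

Bitmap = List[List[int]]  # 1 = ink pixel, 0 = background
Tree = Union[str, Tuple[int, Dict[int, "Tree"], str]]  # leaf label or (feature, branches, default)


def tetra_bit_values(char_img: Bitmap, grid: int = 2) -> List[int]:
    """Hypothetical TBV encoding: split the bitmap into grid x grid cells, split each
    cell into four quadrants, and set one bit per quadrant that contains ink."""
    h, w = len(char_img), len(char_img[0])
    values = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            ym, xm = (y0 + y1) // 2, (x0 + x1) // 2
            quadrants = [(y0, ym, x0, xm), (y0, ym, xm, x1),   # top-left, top-right
                         (ym, y1, x0, xm), (ym, y1, xm, x1)]   # bottom-left, bottom-right
            tbv = 0
            for bit, (ya, yb, xa, xb) in enumerate(quadrants):
                if any(char_img[y][x] for y in range(ya, yb) for x in range(xa, xb)):
                    tbv |= 1 << bit
            values.append(tbv)  # one value in 0..15 per grid cell
    return values


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows: List[List[int]], labels: List[str], f: int) -> float:
    # Group the script labels by the value of TBV cell f, then measure entropy reduction.
    groups: Dict[int, List[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[f], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows: List[List[int]], labels: List[str], features: List[int]) -> Tree:
    """ID3-style tree: split on the TBV cell with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: info_gain(rows, labels, f))
    if info_gain(rows, labels, best) <= 0:
        return majority
    branches: Dict[int, Tree] = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [f for f in features if f != best])
    return (best, branches, majority)


def predict(tree: Tree, tbv: List[int]) -> str:
    while isinstance(tree, tuple):
        feature, branches, default = tree
        if tbv[feature] not in branches:
            return default  # unseen cell value: fall back to the node's majority script
        tree = branches[tbv[feature]]
    return tree


def word_script(tree: Tree, initial_char: Bitmap) -> str:
    """Script of a whole word, predicted from its (already segmented) initial character."""
    return predict(tree, tetra_bit_values(initial_char))


# Toy 4x4 glyphs, invented for illustration only (not real Tamil or Latin characters).
tamil_like = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 1]]
latin_like = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
rows = [tetra_bit_values(tamil_like), tetra_bit_values(latin_like)]
tree = build_tree(rows, ["Tamil", "English"], features=list(range(4)))
print(tetra_bit_values(tamil_like))   # [0, 0, 7, 11]
print(word_script(tree, latin_like))  # English
```

Splitting on whole 4-bit cell values keeps every node a multiway categorical test on a single grid cell. A real recogniser would be trained on many segmented initial characters per script rather than on two toy glyphs.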


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author>
<name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">11-0056451</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 11-0056451 INIST</idno>
<idno type="RBID">Pascal:11-0056451</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000153</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000620</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000149</idno>
<idno type="wicri:doubleKey">0952-8091:2010:Abirami S:local:features:based</idno>
<idno type="wicri:Area/Main/Merge">000780</idno>
<idno type="wicri:Area/Main/Curation">000775</idno>
<idno type="wicri:Area/Main/Exploration">000775</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author>
<name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Alphabet</term>
<term>Bilingualism</term>
<term>Character recognition</term>
<term>Decision tree</term>
<term>Document analysis</term>
<term>Grid</term>
<term>Hierarchical classification</term>
<term>Image processing</term>
<term>Modeling</term>
<term>Multilingualism</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Performance evaluation</term>
<term>Printed document</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance forme</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Analyse documentaire</term>
<term>Langage naturel</term>
<term>Texte</term>
<term>Traitement image</term>
<term>Grille</term>
<term>Arbre décision</term>
<term>Document imprimé</term>
<term>Multilinguisme</term>
<term>Bilinguisme</term>
<term>Classification hiérarchique</term>
<term>Alphabet</term>
<term>Modélisation</term>
<term>Evaluation performance</term>
<term>52477</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Multilinguisme</term>
<term>Bilinguisme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Classifying and identifying the language in a biscript document is an important step in the design of an OCR system, since successful analysis and recognition depend on it. This paper presents an architecture for script recognition in bilingual (Tamil and English) document images that specifically addresses the challenges of recognition at the character level: it predicts the script of a word image from the word's initial character, which lets it adapt to various font faces and sizes. The recogniser models every character as Tetra bit values (TBV), which capture the character's spatial spread and are derived from its segmented grids. A decision tree classifier (DTC) is employed to classify the script from the patterns generated by the TBV. The spatial features-based script recogniser (SFBSR) is trained and tested on bilingual document images containing various Tamil and English words to show its effectiveness for script identification. The classification accuracy on the training and testing sets is promising. An evaluation of the system's performance against various other techniques shows a significant performance improvement for SFBSR. The recogniser can be embedded in an OCR system ahead of its recognition stage.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Inde</li>
</country>
</list>
<tree>
<country name="Inde">
<noRegion>
<name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
</noRegion>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000775 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000775 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:11-0056451
   |texte=   Local features-based script recognition from printed bilingual document images
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024